-
Notifications
You must be signed in to change notification settings - Fork 785
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up the substring
kernel by about 2x
#1512
Conversation
Signed-off-by: remzi <[email protected]>
Signed-off-by: remzi <[email protected]>
offsets | ||
.windows(2) | ||
.map(|pair| (pair[1] + start).max(pair[0])) | ||
.collect() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be better to do new_starts
/new_length
in one iteration over offsets
and use unzip
to materialize into two Vecs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about it, I think actually materializing new_length
to a Vec is not needed. It probably is faster (and saves allocating this Vec) to compute the length again when calculating new_values
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to remove .collect
, actually. But you know, closure values in if-else
branch can get no two closures, even if identical, have the same type
error. And I did not find a good solution except adding .collect
rust-lang/rust#87961
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is not possible in Rust (I think only by Boxing or manually inlining / macros).
Calculating the length in two places with something like let length = *end - *start
within both the new_offsets
new_values
doesn´t seem too bad in terms of duplication?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is not possible in Rust (I think only by Boxing or manually inlining / macros).
Calculating the length in two places with something like
let length = *end - *start
within both thenew_offsets
new_values
doesn´t seem too bad in terms of duplication?
One way is to allow the closure not to capture the environment by adding more parameters, and the compiler will downcast closure to function. This works for calculating new_starts
, but doesn't work for new_length
where we must capture Some(length) = length
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My later comment mentioned not allocating the new vec but not allocating the array at all (by moving the calculation to the iteration in new_values
). Is that something you want to try?
Thinking about it, I think actually materializing new_length to a Vec is not needed. It probably is faster (and saves allocating this Vec) to compute the length again when calculating new_values.
let length_i: OffsetSize = offsets[i + 1] - offsets[i]; | ||
// compute where we should start slicing this entry | ||
let start = offsets[i] | ||
+ if start >= OffsetSize::zero() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to compare start
with zero
multiple time. So I move this outside the for loop
let start = start.max(offsets[i]).min(offsets[i + 1]); | ||
// compute the length of the slice | ||
let length: OffsetSize = length | ||
.unwrap_or(length_i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same reason! we don't need to unwrap
length
in each loop. So move this outside the loop.
.map(|(start, length)| { | ||
(start.to_usize().unwrap(), length.to_usize().unwrap()) | ||
}) | ||
.map(|(start, length)| &data[start..start + length]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.map(|(start, length)| { | |
(start.to_usize().unwrap(), length.to_usize().unwrap()) | |
}) | |
.map(|(start, length)| &data[start..start + length]) | |
.map(|(start, length)| { | |
let start = start.to_usize().unwrap(); | |
let length = length.to_usize().unwrap(); | |
&data[start..start + length] | |
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
Signed-off-by: remzi <[email protected]>
Signed-off-by: remzi <[email protected]>
Codecov Report
@@ Coverage Diff @@
## master #1512 +/- ##
==========================================
+ Coverage 82.72% 82.75% +0.02%
==========================================
Files 189 190 +1
Lines 54561 54707 +146
==========================================
+ Hits 45137 45274 +137
- Misses 9424 9433 +9
Continue to review full report at Codecov.
|
Signed-off-by: remzi <[email protected]>
Signed-off-by: remzi <[email protected]>
Hi @Dandandan ! These are some changes since your last review:
|
length_i + start | ||
}; | ||
let cal_new_start: Box<dyn Fn(OffsetSize, OffsetSize) -> OffsetSize> = if start | ||
>= zero |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cargo fmt
likes this weird format.
length_so_far += length; | ||
// start and end offsets for each substring | ||
let mut new_starts_ends: Vec<(OffsetSize, OffsetSize)> = | ||
Vec::with_capacity(array.len()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't materialize new_starts_ends
here, we have to calculate them twice when calculating new_values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a time-space trade-off, but not the major factor affecting performance
|
||
new_offsets.push(length_so_far); | ||
offsets.windows(2).for_each(|pair| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Purely functional style is not appropriate because new_offsets
and new_starts_ends
have different number of elements. After trying some different ways, I find that mut vec
+ for each
is more readable and efficient.
Hi @Dandandan ! What's your opinion? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @HaoYang670
I spent some time reading this code but I ran out of time to properly review it. I am worried there are corner cases in here that might not be covered by tests -- did you review test coverage and is this properly covered?
I think some comments might help me understand the code more quickly as well.
I will try and find some more time later this week or next to review it more
Thank you very much @alamb. I have reviewed the Codedev report and all code branches have been covered. |
substring
kernelsubstring
kernel by about 2x
Thanks @HaoYang670 and @Dandandan |
Which issue does this PR close?
Closes #1511.
Rationale for this change
Improve the performance and do some clean-up work.
What changes are included in this PR?
new_offsets
first to estimate the memory needed for bufferBenchmark
chip: Intel(R) Core(TM) i7-10700K CPU @ 3.80GHz
About 2X speed-up
Are there any user-facing changes?
No.